Restructuring Databases for Knowledge Discovery by Consolidation and Link Formation
Abstract
Databases often identify entities of interest inaccurately. Two operations, consolidation and link formation, are proposed as essential components of KDD systems for certain applications; they complement the usual machine learning techniques that discover classifications through similarity-based clustering. Consolidation relates identifiers present in a database to a set of real-world entities (RWEs) that are not uniquely identified in the database. It may also be viewed as a transformation of representation from the identifiers in the original database to the RWEs. Link formation constructs structured relationships between consolidated RWEs through identifiers and events explicitly represented in the database. An operational knowledge discovery system that identifies potential money laundering in a database of large cash transactions implements both operations. Consolidation and link formation are easily implemented as index creation in relational database management systems.
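The two operations described in the abstract can be illustrated with a small sketch. It is not the paper's implementation. It assumes a deliberately simple consolidation rule (identifiers that appear together in one party record refer to the same RWE) and a simple linking rule (two RWEs are linked when they take part in the same transaction). The record layout, the sample data, and the names `DisjointSet`, `consolidate`, and `form_links` are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


class DisjointSet:
    """Union-find over hashable identifiers (illustrative helper)."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            # Path halving keeps the trees shallow.
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            # Use the smaller root as the representative so labels are stable.
            self.parent[max(ra, rb)] = min(ra, rb)


def consolidate(party_records):
    """Map each party record to an RWE label.

    Assumed rule: identifiers seen together in one record denote one RWE.
    """
    ds = DisjointSet()
    for identifiers in party_records.values():
        first = identifiers[0]
        ds.find(first)  # register records that carry a single identifier
        for other in identifiers[1:]:
            ds.union(first, other)
    return {rid: ds.find(ids[0]) for rid, ids in party_records.items()}


def form_links(transactions, record_to_rwe):
    """Link pairs of RWEs that co-occur in a transaction (assumed rule)."""
    links = defaultdict(set)
    for txn_id, record_ids in transactions.items():
        rwes = sorted({record_to_rwe[r] for r in record_ids})
        for a, b in combinations(rwes, 2):
            links[(a, b)].add(txn_id)
    return dict(links)


# Hypothetical data: four party records that describe two real-world entities.
party_records = {
    "p1": [("name", "J. Smith"), ("id", "111")],
    "p2": [("name", "John Smith"), ("id", "111")],
    "p3": [("name", "A. Jones"), ("account", "A-9")],
    "p4": [("name", "Alice Jones"), ("account", "A-9")],
}
transactions = {"t1": ["p1", "p3"], "t2": ["p2", "p4"]}

record_to_rwe = consolidate(party_records)
links = form_links(transactions, record_to_rwe)
```

Here the shared identifiers merge `p1` with `p2` and `p3` with `p4`, so two transactions that look unrelated at the record level become two events on a single link between the same pair of RWEs. A production system would need a more careful matching rule than exact identifier equality.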
Related papers
Restructuring Transactional Data for Link Analysis in the FinCEN AI System
Due to the nature and costs of data collection, many real-world databases consist of large numbers of independent transactions. Finding evidence of structured groups of entities reflected in this data is a task aptly suited to Link Analysis. However, the databases usually must be restructured to allow effective search and analysis of the linkage structures hidden in the original transactions. Th...
A Review of Data Mining Applications in the Health System
Introduction: The extensive amounts of data stored in medical databases require specialized tools for accessing and analyzing the data, discovering knowledge, and using the data effectively. Data mining is one of the most important of these methods. The article sketches the data mining techniques used and illustrates their applicability to medical diagnostic and prognostic problems. ...
Cross Border Mergers and Acquisitions by Indian firms-An Analysis of Pre and Post Merger performance
The corporate sector all over the world is restructuring its operations through different types of consolidation strategies like mergers and acquisitions in order to face challenges posed by the new pattern of globalization, which has led to the greater integration of national and international markets. The intensity of cross-border operations recorded an unprecedented ...
Preprocessing and Integration of Data from Multiple Sources for Knowledge Discovery
The explosive growth in the generation and collection of data has generated an urgent need for a new generation of techniques and tools that can assist in transforming these data intelligently and automatically into useful knowledge. Knowledge discovery is an emerging multidisciplinary field that attempts to fulfill this need. Knowledge discovery is a large process that includes data selection,...
Extracting Prior Knowledge from Data Distribution to Migrate from Blind to Semi-Supervised Clustering
Although many studies have been conducted to improve clustering efficiency, most state-of-the-art schemes suffer from a lack of robustness and stability. This paper proposes an efficient approach to elicit prior knowledge in terms of must-link and cannot-link from the estimated distribution of raw data in order to convert a blind clustering problem into a semi-supervised o...